feat: implement cronjob based sitemap automation #374

Open
amaan-bhati wants to merge 12 commits into main from sitemap-automation-cron

@amaan-bhati
Member

@amaan-bhati amaan-bhati commented Apr 14, 2026

Replaces the runtime sitemap with a committed static sitemap and adds two workflows:

  • sync-sitemap.yml: regenerates public/sitemap.xml from WordPress and opens/updates a PR if it changed
  • submit-google-sitemap.yml: submits https://keploy.io/blog/sitemap.xml to Google Search Console after sitemap changes land on main

What Changed

  • Added scripts/generate-sitemap.mjs
    • fetches all sitemap-worthy posts from WPGraphQL
    • rebuilds the full sitemap from scratch
    • writes public/sitemap.xml
  • Added public/sitemap.xml as the versioned sitemap source of truth
  • Added .github/workflows/sync-sitemap.yml
  • Added .github/workflows/submit-google-sitemap.yml
  • Removed the runtime sitemap route: pages/sitemap.xml.tsx
  • Added explicit cache headers in next.config.js
    • /sitemap.xml: 5 minutes
    • /robots.txt: 24 hours
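
The cache rules above could be expressed through the Next.js `headers()` config hook. This is a minimal sketch, not the PR's actual config: the description gives only the durations, so the `Cache-Control` directive strings and the `nextConfig` name are assumptions.

```javascript
// Hypothetical excerpt of next.config.js; the Cache-Control values are assumed.
const FIVE_MINUTES = 5 * 60;
const ONE_DAY = 24 * 60 * 60;

const nextConfig = {
  // Next.js reads these rules and attaches the headers to matching paths.
  async headers() {
    return [
      {
        source: "/sitemap.xml",
        headers: [
          { key: "Cache-Control", value: `public, max-age=${FIVE_MINUTES}` },
        ],
      },
      {
        source: "/robots.txt",
        headers: [
          { key: "Cache-Control", value: `public, max-age=${ONE_DAY}` },
        ],
      },
    ];
  },
};

// In next.config.js this object would be exported: module.exports = nextConfig;
```

Keeping the sitemap TTL short limits how long a CDN or browser can serve the previous file after a sitemap PR merges.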

Why

  • keeps sitemap content reviewable in git
  • avoids runtime dependency on WordPress for sitemap serving
  • avoids stale-public-sitemap issues right after merge
  • ties Google submission to reviewed sitemap changes on main

Flow

Sync sitemap

  1. Workflow runs daily or manually.
  2. node scripts/generate-sitemap.mjs fetches WordPress posts and rewrites public/sitemap.xml.
  3. If git diff shows changes, the workflow creates or updates a PR on automation/sync-sitemap.
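
Step 2 of the flow above depends on reading every post from WPGraphQL, which is paginated by cursor. The sketch below shows one way to do that. The query shape, the page size, and the names `POSTS_QUERY` and `fetchAllPosts` are assumptions for illustration, not the contents of scripts/generate-sitemap.mjs.

```javascript
// Sketch only: query shape, page size, and function names are assumptions.
const POSTS_QUERY = `
  query SitemapPosts($first: Int!, $after: String) {
    posts(first: $first, after: $after) {
      pageInfo { hasNextPage endCursor }
      nodes { slug modified }
    }
  }
`;

// Follows WPGraphQL cursor pagination until hasNextPage is false.
// fetchImpl is injectable so the function can be exercised without a network.
async function fetchAllPosts(endpoint, fetchImpl = fetch, pageSize = 100) {
  const posts = [];
  let after = null;
  let hasNextPage = true;

  while (hasNextPage) {
    const response = await fetchImpl(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        query: POSTS_QUERY,
        variables: { first: pageSize, after },
      }),
    });
    if (!response.ok) {
      throw new Error(`WPGraphQL request failed with HTTP ${response.status}`);
    }
    const payload = await response.json();
    if (payload.errors?.length) {
      throw new Error(`WPGraphQL returned errors: ${JSON.stringify(payload.errors)}`);
    }
    const page = payload.data.posts;
    posts.push(...page.nodes);
    hasNextPage = page.pageInfo.hasNextPage;
    after = page.pageInfo.endCursor;
  }
  return posts;
}
```

Throwing on HTTP or GraphQL errors makes the workflow step fail instead of writing a partial sitemap.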

Submit to Google

  1. A sitemap PR is merged to main.
  2. submit-google-sitemap.yml runs when public/sitemap.xml changes.
  3. The workflow exchanges the Google refresh token for an access token.
  4. It submits https://keploy.io/blog/sitemap.xml to Search Console.
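
Steps 3 and 4 above can be sketched as two HTTP calls: a refresh-token exchange against Google's OAuth token endpoint, then a PUT to the Search Console (Webmasters v3) sitemaps endpoint. The function names and the injectable `fetchImpl` parameter are hypothetical. The workflow in this PR does the same work with its own steps.

```javascript
// Sketch only: function names are hypothetical; fetchImpl is injectable for testing.
const TOKEN_URL = "https://oauth2.googleapis.com/token";
const WEBMASTERS_BASE = "https://www.googleapis.com/webmasters/v3";

// Step 3: trade the long-lived refresh token for a short-lived access token.
async function getAccessToken({ clientId, clientSecret, refreshToken }, fetchImpl = fetch) {
  const response = await fetchImpl(TOKEN_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "refresh_token",
      client_id: clientId,
      client_secret: clientSecret,
      refresh_token: refreshToken,
    }).toString(),
  });
  if (!response.ok) {
    throw new Error(`Token exchange failed with HTTP ${response.status}`);
  }
  const { access_token: accessToken } = await response.json();
  if (!accessToken) {
    throw new Error("Token response did not include access_token");
  }
  return accessToken;
}

// Step 4: PUT the sitemap URL under the Search Console property.
// Both path segments must be URL-encoded (e.g. "sc-domain:keploy.io").
async function submitSitemap({ siteUrl, sitemapUrl, accessToken }, fetchImpl = fetch) {
  const url =
    `${WEBMASTERS_BASE}/sites/${encodeURIComponent(siteUrl)}` +
    `/sitemaps/${encodeURIComponent(sitemapUrl)}`;
  const response = await fetchImpl(url, {
    method: "PUT",
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  if (!response.ok) {
    throw new Error(`Sitemap submission failed with HTTP ${response.status}`);
  }
  return url;
}
```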

Key Details

  • lastmod is normalized to YYYY-MM-DD
  • posts missing modified fall back to the latest valid post lastmod across the sitemap
  • the sync workflow runs the generator directly with Node and does not install dependencies
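
A rough illustration of the first two details as described at this point in the PR: dates are cut to YYYY-MM-DD, and entries without a usable value inherit the latest valid date in the set. The names `normalizeLastmod` and `fillMissingLastmod` are hypothetical and the real script may be structured differently.

```javascript
// Sketch only: function names are hypothetical, not taken from the script.

// Returns the YYYY-MM-DD prefix of a timestamp, or null if it is not a real calendar date.
function normalizeLastmod(value) {
  if (typeof value !== "string") return null;
  const match = /^(\d{4})-(\d{2})-(\d{2})/.exec(value.trim());
  if (!match) return null;
  const [datePart, year, month, day] = match;
  // Round-trip through Date.UTC to reject impossible dates such as 2024-02-30.
  const date = new Date(Date.UTC(Number(year), Number(month) - 1, Number(day)));
  const isRealDate =
    date.getUTCFullYear() === Number(year) &&
    date.getUTCMonth() === Number(month) - 1 &&
    date.getUTCDate() === Number(day);
  return isRealDate ? datePart : null;
}

// Entries with no valid lastmod receive the latest valid lastmod in the set,
// or today's date (UTC) when no entry has one.
function fillMissingLastmod(entries) {
  const normalized = entries.map((entry) => ({
    ...entry,
    lastmod: normalizeLastmod(entry.lastmod),
  }));
  // ISO dates sort correctly as plain strings.
  const known = normalized.map((entry) => entry.lastmod).filter(Boolean).sort();
  const latest = known.length
    ? known[known.length - 1]
    : new Date().toISOString().slice(0, 10);
  return normalized.map((entry) => ({ ...entry, lastmod: entry.lastmod ?? latest }));
}
```

Taking the date from the string prefix, rather than parsing and re-serializing through `Date`, avoids a one-day shift when the timestamp carries no timezone.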

Required Config

Generator / sync workflow

  • WORDPRESS_API_URL
    • workflow default: https://wp.keploy.io/graphql
    • can be overridden with GitHub repo var WORDPRESS_API_URL

Google submission secrets

  • GOOGLE_CLIENT_ID
  • GOOGLE_CLIENT_SECRET
  • GOOGLE_REFRESH_TOKEN
  • GOOGLE_SEARCH_CONSOLE_SITE_URL

Example:

GOOGLE_SEARCH_CONSOLE_SITE_URL=sc-domain:keploy.io

Validation

  • generated the sitemap from the live WPGraphQL endpoint
  • verified XML with xmllint
  • ran npm run lint
  • ran npm run build

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Copilot AI review requested due to automatic review settings April 14, 2026 16:52
Contributor

Copilot AI left a comment

Pull request overview

Implements cron-based sitemap generation and publication by moving sitemap creation out of the Next.js runtime and into scheduled GitHub Actions, with optional Google Search Console submission on updates.

Changes:

  • Add scripts/generate-sitemap.mjs to generate public/sitemap.xml from WPGraphQL.
  • Replace SSR /sitemap.xml handler by committing a static public/sitemap.xml.
  • Add GitHub Actions workflows to (1) periodically sync public/sitemap.xml via PR and (2) submit the sitemap to Google on main updates.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| scripts/generate-sitemap.mjs | New Node script to fetch WP posts and build/write public/sitemap.xml. |
| public/sitemap.xml | New committed sitemap output (generated content). |
| pages/sitemap.xml.tsx | Removed SSR sitemap endpoint implementation. |
| package.json | Adds generate:sitemap script entry. |
| .github/workflows/sync-sitemap.yml | Scheduled job to regenerate sitemap and open/update a PR when it changes. |
| .github/workflows/submit-google-sitemap.yml | Submits sitemap to Google Search Console when public/sitemap.xml changes on main. |

Comment thread .github/workflows/sync-sitemap.yml Outdated
Comment thread scripts/generate-sitemap.mjs Outdated
Comment thread scripts/generate-sitemap.mjs
Comment thread scripts/generate-sitemap.mjs Outdated
Comment thread .github/workflows/sync-sitemap.yml Outdated
Comment thread .github/workflows/sync-sitemap.yml
…view

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Copilot AI review requested due to automatic review settings April 14, 2026 17:21
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.


Comment thread scripts/generate-sitemap.mjs
Comment thread .github/workflows/sync-sitemap.yml Outdated
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.


Comment thread scripts/generate-sitemap.mjs Outdated
Comment thread .github/workflows/submit-google-sitemap.yml Outdated
Comment thread .github/workflows/submit-google-sitemap.yml Outdated
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.


Comment thread scripts/generate-sitemap.mjs Outdated
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.


Comment thread scripts/generate-sitemap.mjs Outdated
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated no new comments.


Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
@amaan-bhati amaan-bhati reopened this Apr 15, 2026
@amaan-bhati amaan-bhati requested a review from Copilot April 15, 2026 06:45
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.


Comment thread .github/workflows/submit-google-sitemap.yml
Comment thread .github/workflows/submit-google-sitemap.yml Outdated
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.


Comment thread .github/workflows/submit-google-sitemap.yml
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated no new comments.


Member

@nehagup nehagup left a comment

We don't need to open a separate PR for sitemap changes. Instead, we can trigger the workflow on request so that the sitemap update is committed to the same PR. Once that PR is merged, the sitemap change is merged along with it.

@nehagup
Member

nehagup commented Apr 15, 2026

Implementation-wise, great direction.

  1. The fallback lastmod is misleading.
    scripts/generate-sitemap.mjs:199:

      const fallbackLastmod = latestOverall ?? new Date().toISOString().split("T")[0];
      for (const entry of missingLastmodEntries) {
        entry.lastmod = fallbackLastmod;
      }

    If a single post has no modified field, the script assigns it the latest date of any post in the entire feed. So a 2020 post that's missing modified would get tagged as updated yesterday, which is a false freshness signal Google will eventually distrust. Better: fall back to the post's own date (publish date), and only fall back to "today" if both modified AND date are absent. The query needs to add date to the GraphQL selection.

  2. If wp.keploy.io/graphql is down, rate-limited, or returns errors, the workflow fails silently. Add a step with if: failure() that posts to Slack/Discord (#qa-alerts or wherever) so you find out the same day instead of when Google Search Console starts complaining about a stale sitemap a week later. Use the existing Slack webhook if you have one.

  3. The hand-rolled JWT signing is fragile.
    .github/workflows/submit-google-sitemap.yml:38–99 is 60 lines of bash + python + openssl to sign a JWT, exchange it for an OAuth token, and call the Webmasters API. It works, but a Node.js step using the official googleapis package is ~10 lines and dramatically easier to debug:

      - run: |
          node -e "
          const {google} = require('googleapis');
          const auth = new google.auth.JWT(
            process.env.GOOGLE_SERVICE_ACCOUNT_EMAIL, null,
            process.env.GOOGLE_SERVICE_ACCOUNT_PRIVATE_KEY,
            ['https://www.googleapis.com/auth/webmasters']
          );
          const sc = google.webmasters({version:'v3', auth});
          sc.sitemaps.submit({
            siteUrl: process.env.GOOGLE_SEARCH_CONSOLE_SITE_URL,
            feedpath: process.env.GOOGLE_SITEMAP_URL
          }).then(()=>console.log('submitted'));
          "
        env: { ... }

    Optional, but I'd push for it: fewer moving parts, better error messages.

  4. Remove the existing next-sitemap automation.

  5. Keep the priority at 0.7.

  6. No image:image markup in the sitemap. Every blog post has a featured image. Adding image sitemap entries (xmlns:image="http://www.google.com/schemas/sitemap-image/0.9") is a free image-search visibility boost. ~10 lines of extra code in buildSitemapXml.
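
The lastmod fallback order suggested above (modified, then the post's own publish date, then today) could look roughly like this. It is only a sketch: `toDateOnly` and `resolvePostLastmod` are hypothetical names, and it assumes `modified` and `date` arrive as ISO-style strings from WPGraphQL.

```javascript
// Sketch only: helper names are hypothetical.
// `modified` and `date` are the WPGraphQL post fields discussed in the review.

// Keeps the leading YYYY-MM-DD of a timestamp string, or returns null.
function toDateOnly(value) {
  const match = typeof value === "string" ? /^\d{4}-\d{2}-\d{2}/.exec(value) : null;
  return match ? match[0] : null;
}

// Resolution order: modified -> date (publish date) -> today.
// `today` is a parameter so the function stays deterministic under test.
function resolvePostLastmod(post, today = new Date().toISOString().slice(0, 10)) {
  return toDateOnly(post.modified) ?? toDateOnly(post.date) ?? today;
}
```

With this shape, a 2020 post missing `modified` keeps its 2020 publish date instead of inheriting another post's recent date.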

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Copilot AI review requested due to automatic review settings April 15, 2026 21:36
@amaan-bhati
Member Author

Thanks @nehagup for the review and for helping me optimise the solution. I have gone through all the points mentioned and addressed them:

1. Fallback lastmod - Removed the batch missingLastmodEntries approach. Each post now resolves modified → date (publish date) → today inline. Added date to the GraphQL selection. latestOverall no longer gets assigned to individual posts - only used for static entries (/blog, category pages).

2. Failure notifications - Acknowledged; skipping for now. Will revisit once we have discussed the webhook setup and implementation.

3. Hand-rolled JWT - Replaced the 60-line bash/python/openssl block with the googleapis Node.js package. Added checkout + setup-node + npm install --no-save googleapis. Private key \n normalisation handled inline (key.replace(/\\n/g, '\n')). Validation step kept for clear error messages on missing secrets.

4. next-sitemap - Confirmed it is not in the codebase; nothing to remove.

5. Priority - Updated to 0.70.

6. Image sitemap - Implemented. Added featuredImage { node { sourceUrl altText } } to the GraphQL query. Each post with a featured image gets an <image:image> block; xmlns:image namespace only injected when at least one image is present. Validated against the live WordPress API:

curl -sS -X POST "https://wp.keploy.io/graphql" \
  -H "Content-Type: application/json" \
  -d '{"query":"{ posts(first: 50, where: { orderby: { field: MODIFIED, order: DESC } }) { edges { node { slug featuredImage { node { sourceUrl altText } } } } } }"}'

50/50 posts returned absolute https:// sourceUrl values with populated altText - no null images, no relative URLs.
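
For reference, the image handling described in point 6 can be sketched as below. This is not the script's actual `buildSitemapXml`: the entry shape (`loc`, `lastmod`, `imageUrl`) and the `escapeXml` helper are assumptions, and the sketch emits only `<image:loc>`, leaving out any use of `altText`.

```javascript
// Sketch only: the entry shape and helper names are assumptions.
const SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";
const IMAGE_NS = "http://www.google.com/schemas/sitemap-image/0.9";

// Escape the five XML special characters so URLs containing "&" stay well-formed.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemapXml(entries) {
  // Declare the image namespace only when at least one entry has an image.
  const hasImages = entries.some((entry) => entry.imageUrl);
  const namespaces = hasImages
    ? `xmlns="${SITEMAP_NS}" xmlns:image="${IMAGE_NS}"`
    : `xmlns="${SITEMAP_NS}"`;

  const urls = entries.map((entry) => {
    const lines = [
      "  <url>",
      `    <loc>${escapeXml(entry.loc)}</loc>`,
      `    <lastmod>${entry.lastmod}</lastmod>`,
      "    <priority>0.70</priority>",
    ];
    if (entry.imageUrl) {
      lines.push(
        "    <image:image>",
        `      <image:loc>${escapeXml(entry.imageUrl)}</image:loc>`,
        "    </image:image>"
      );
    }
    lines.push("  </url>");
    return lines.join("\n");
  });

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<urlset ${namespaces}>`,
    ...urls,
    "</urlset>",
    "",
  ].join("\n");
}
```

Escaping matters here because WordPress media URLs can carry query strings, and a raw `&` would make the file fail an xmllint check.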

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.


Comment thread scripts/generate-sitemap.mjs
Comment thread scripts/generate-sitemap.mjs
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 7 changed files in this pull request and generated no new comments.


3 participants